Platform Explorer / Nuxeo Platform 2023.10

Operation PDF.ExtractText (PDF: Extract Text)

Description

Extracts raw text from a PDF. If the PDF is encrypted, a password is required. pdfxpath is the xpath of the PDF blob (defaults to file:content). The extracted text is stored in the targetxpath property of the input document, and the document is saved if save is true. If patterntofind is not provided, the operation extracts all the text it can; otherwise it extracts only the line where the pattern is found. If patterntofind is provided and removepatternfromresult is true, that line is returned without the pattern.
Operation id PDF.ExtractText
Category Document
Label PDF: Extract Text
Requires
Since
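
The interaction between patterntofind and removepatternfromresult described above can be modelled with a few lines of Python. This is only an illustrative sketch, not the Nuxeo implementation: the function name extract_text is hypothetical, and it assumes that the pattern is a literal substring (not a regular expression), that lines are newline-delimited, that only the first matching line is returned, and that nothing is returned when the pattern is absent. None of these details are confirmed by this page.

```python
from typing import Optional


def extract_text(full_text: str,
                 pattern_to_find: Optional[str] = None,
                 remove_pattern_from_result: bool = False) -> Optional[str]:
    """Hypothetical model of the documented behaviour (not Nuxeo code)."""
    # No pattern provided: return all the text that was extracted.
    if not pattern_to_find:
        return full_text

    # Pattern provided: keep only the line where the pattern is found.
    # Assumption: literal substring match, first matching line wins.
    for line in full_text.splitlines():
        if pattern_to_find in line:
            if remove_pattern_from_result:
                # Assumption: the first occurrence is removed and the
                # remainder is trimmed of surrounding whitespace.
                return line.replace(pattern_to_find, "", 1).strip()
            return line

    # Assumption: no match yields no result.
    return None
```

Under these assumptions, a pattern such as "Contract ID: " combined with removepatternfromresult set to true would leave only the value that follows the label on that line.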

Parameters

Name                     Description  Type     Required  Default value
password                              string   no
patterntofind                         string   no
pdfxpath                              string   no
removepatternfromresult               boolean  no
save                                  boolean  no
targetxpath                           string   no
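
As a rough illustration of how these parameters might be passed, the sketch below builds a JSON request body for the Nuxeo Automation REST endpoint. It assumes the operation is exposed at api/v1/automation/PDF.ExtractText under the server's base URL and that the input document is referenced as "doc:" followed by a path or id. The helper names build_request and operation_url, the server URL, the document path and the dc:description target are placeholders chosen for the example, not values taken from this page.

```python
import json
from typing import Any, Dict, Optional

OPERATION_ID = "PDF.ExtractText"


def operation_url(base_url: str) -> str:
    """Return the assumed Automation endpoint URL for this operation."""
    return base_url.rstrip("/") + "/api/v1/automation/" + OPERATION_ID


def build_request(doc_ref: str,
                  *,
                  password: Optional[str] = None,
                  patterntofind: Optional[str] = None,
                  pdfxpath: Optional[str] = None,
                  removepatternfromresult: Optional[bool] = None,
                  save: Optional[bool] = None,
                  targetxpath: Optional[str] = None) -> Dict[str, Any]:
    """Build the request body; parameters left as None are omitted."""
    candidates = {
        "password": password,
        "patterntofind": patterntofind,
        "pdfxpath": pdfxpath,
        "removepatternfromresult": removepatternfromresult,
        "save": save,
        "targetxpath": targetxpath,
    }
    # All parameters are optional, so only send the ones that were set.
    params = {name: value for name, value in candidates.items()
              if value is not None}
    return {"input": "doc:" + doc_ref, "params": params, "context": {}}


if __name__ == "__main__":
    # Placeholder server, document path and target property.
    body = build_request(
        "/default-domain/workspaces/contracts/agreement",
        patterntofind="Contract ID: ",
        removepatternfromresult=True,
        targetxpath="dc:description",
        save=True,
    )
    print("POST", operation_url("http://localhost:8080/nuxeo"))
    print(json.dumps(body, indent=2))
```

The body would be sent with an authenticated POST and a JSON content type. Omitting unset parameters leaves the server free to apply its own defaults, such as file:content for pdfxpath.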

Signature

Inputs document, documents
Outputs document, documents

Implementation Information

Implementation Class org.nuxeo.ecm.platform.pdf.operations.PDFExtractTextOperation
Contributing Component org.nuxeo.ecm.platform.pdf.operations

JSON Definition

{
  "id" : "PDF.ExtractText",
  "label" : "PDF: Extract Text",
  "category" : "Document",
  "requires" : null,
  "description" : "Extracts raw text from a PDF. If the PDF is encrypted, a password is required. pdfxpath is the xpath of the blob (default to file:content). The extracted text is set in the targetxpath property of the input document, which is saved if save is true. If patterntofind is not provided, extracts all the text it can, else it extracts only the line where the pattern is found. If patterntofind is provided and removepatternfromresult is true, the line is returned without the pattern.",
  "url" : "PDF.ExtractText",
  "signature" : [ "document", "document", "documents", "documents" ],
  "params" : [ {
    "name" : "password",
    "description" : null,
    "type" : "string",
    "required" : false,
    "widget" : null,
    "order" : 0,
    "values" : [ ]
  }, {
    "name" : "patterntofind",
    "description" : null,
    "type" : "string",
    "required" : false,
    "widget" : null,
    "order" : 0,
    "values" : [ ]
  }, {
    "name" : "pdfxpath",
    "description" : null,
    "type" : "string",
    "required" : false,
    "widget" : null,
    "order" : 0,
    "values" : [ ]
  }, {
    "name" : "removepatternfromresult",
    "description" : null,
    "type" : "boolean",
    "required" : false,
    "widget" : null,
    "order" : 0,
    "values" : [ ]
  }, {
    "name" : "save",
    "description" : null,
    "type" : "boolean",
    "required" : false,
    "widget" : null,
    "order" : 0,
    "values" : [ ]
  }, {
    "name" : "targetxpath",
    "description" : null,
    "type" : "string",
    "required" : false,
    "widget" : null,
    "order" : 0,
    "values" : [ ]
  } ]
}